1 Poster papers - short papers: Auto-generation of topic hierarchies for web images from users' perspectives 100%

Pu-Jen Cheng , Lee-Feng Chien 

Proceedings of the twelfth international conference on Information and knowledge 
management November 2003 

In this paper, we propose an approach to automatically generating a Yahoo!-like topic
hierarchy for organizing Web images from users' perspectives. Relatively little effort has been 
devoted towards providing such a taxonomy simultaneously considering users' image requests 
for semantic and visual information. Based on the characteristic that a Web-image query may 
be refined by various attributes, the proposed approach hierarchically groups similar queries 
from search engine logs into topic classe ... 

2 Latent dirichlet allocation 100% 
David M. Blei , Andrew Y. Ng , Michael I. Jordan

The Journal of Machine Learning Research March 2003 
Volume 3 

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections 
of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in 
which each item of a collection is modeled as a finite mixture over an underlying set of topics. 
Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic 
probabilities. In the context of text modeling, the topic probabilities provide an explicit 
representation of a document. ... 

3 Similarity querying II: Using sets of feature vectors for similarity search on voxelized CAD objects 100%

Hans-Peter Kriegel , Stefan Brecheisen , Peer Kroger , Martin Pfeifle , Matthias Schubert
Proceedings of the 2003 ACM SIGMOD international conference on Management of
data June 2003

In modern application domains such as multimedia, molecular biology and medical imaging, 
similarity search in database systems is becoming an increasingly important task. Especially 
for CAD applications, suitable similarity models can help to reduce the cost of developing and 
producing new parts by maximizing the reuse of existing parts. Most of the existing similarity 
models are based on feature vectors. In this paper, we shortly review three models which 
pursue this paradigm. Based on the most ... 
4 Spatial indexing of high-dimensional data based on relative approximation 100%
Yasushi Sakurai , Masatoshi Yoshikawa , Shunsuke Uemura , Haruhiko Kojima

The VLDB Journal - The International Journal on Very Large Data Bases October 2002
Volume 11 Issue 2

We propose a novel index structure, the A-tree (approximation tree), for similarity searches in 
high-dimensional data. The basic idea of the A-tree is the introduction of virtual bounding 
rectangles (VBRs) which contain and approximate MBRs or data objects. VBRs can be 
represented quite compactly and thus affect the tree configuration both quantitatively and 
qualitatively. First, since tree nodes can contain a large number of VBR entries, fanout 
becomes large, which increases search speed. More ... 



5 Best Paper: Early experiences with a 3D model search engine 100% 
Patrick Min , John A. Halderman , Michael Kazhdan , Thomas A. Funkhouser

Proceeding of the eighth international conference on 3D web technology March 2003 
New acquisition and modeling tools make it easier to create 3D models, and affordable and 
powerful graphics hardware makes it easier to use them. As a result, the number of 3D models 
available on the web is increasing rapidly. However, it is still not as easy to find 3D models as 
it is to find, for example, text documents and images. What is needed is a "3D model search
engine," a specialized search engine that targets 3D models. We created a prototype 3D model
search engine to investigate the d ... 



6 Non-hierarchical document clustering using the ICL distribution array processor 100% 
E. Rasmussen , P. Willett

Proceedings of the 10th annual international ACM SIGIR conference on Research and
development in information retrieval November 1987

This paper considers the suitability and efficiency of a highly parallel computer, the ICL 
Distributed Array Processor (DAP), for document clustering. Algorithms are described for the 
implementation of the single-pass and reallocation clustering methods on the DAP and on a 
conventional mainframe computer. These methods are used to classify the Cranfield, Vaswani 
and UKCIS document test collections. The results suggest that the parallel architecture of the 
DAP is not well suited to the varia ... 



7 Generation and search of clustered files 100% 
G. Salton , A. Wong 

ACM Transactions on Database Systems (TODS) December 1978 
Volume 3 Issue 4 

A classified, or clustered file is one where related, or similar records are grouped into classes, 
or clusters of items in such a way that all items within a cluster are jointly retrievable. 
Clustered files are easily adapted to broad and narrow search strategies, and simple file 
updating methods are available. An inexpensive file clustering method applicable to large files 
is given together with appropriate file search methods. An abstract model is then introduced to 
predict the retrieval ... 



8 Partitioning-based standard-cell global placement with an exact objective 100% 
Dennis J.-H. Huang , Andrew B. Kahng

Proceedings of the 1997 international symposium on Physical design April 1997 

9 An investigation into coupling measures for C++ 100% 
Lionel Briand , Prem Devanbu , Walcelio Melo

Proceedings of the 19th international conference on Software engineering May 1997 
10 Organization of clustered files for consecutive retrieval 100%
J. S. Deogun , V. V. Raghavan , T. K. W. Tsou

ACM Transactions on Database Systems (TODS) December 1984
Volume 9 Issue 4

This paper studies the problem of storing single-level and multilevel clustered files. Necessary 
and sufficient conditions for a single-level clustered file to have the consecutive retrieval
property (CRP) are developed. A linear time algorithm to test the CRP for a given clustered 
file and to identify the proper arrangement of objects, if CRP exists, is presented. For the 
single-level clustered files that do not have CRP, it is shown that the problem of identifying a 
storage organization w ... 

11 On the reuse of past optimal queries 100% 
Vijay V. Raghavan , Hayri Sever

Proceedings of the 18th annual international ACM SIGIR conference on Research and
development in information retrieval July 1995 

12 Statistical inference of unknown attribute values in databases 100% 
Wen-Chi Hou , Zhongyang Zhang , Nong Zhou

Proceedings of the second international conference on Information and knowledge 
management December 1993 
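13 Similarity querying II: Using sets of feature vectors for similarity search on voxelized CAD objects 77%

Hans-Peter Kriegel , Stefan Brecheisen , Peer Kroger , Martin Pfeifle , Matthias Schubert
Proceedings of the 2003 ACM SIGMOD international conference on
Management of data June 2003

In modern application domains such as multimedia, molecular biology and medical
imaging, similarity search in database systems is becoming an increasingly important
task. Especially for CAD applications, suitable similarity models can help to reduce the
cost of developing and producing new parts by maximizing the reuse of existing parts.
Most of the existing similarity models are based on feature vectors. In this paper, we
shortly review three models which pursue this paradigm. Based on the most ...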

14 Clustering algorithms: FREM: fast and robust EM clustering for large data sets 77%

Proceedings of the eleventh international conference on Information and
knowledge management

This article presents an improved
EM algorithm to cluster large data sets having high dimensionality, noise and zero
variance problems. The algorithm incorporates improvements to increase the quality of
solutions and speed. In general the algorithm can find a good clustering solution in 3
scans over the data set. Alternatively, it can be run until it converges. The algorithm
has a few parameters that are easy to set and have defaults for most ca ...

15 Searching in metric spaces 

Edgar Chávez , Gonzalo Navarro , Ricardo Baeza-Yates , José Luis Marroquín
ACM Computing Surveys (CSUR) September 2001

The problem of searching the elements of a set that are close to a given query
element under some similarity criterion has a vast number of applications in many
branches of computer science, from pattern recognition to textual and multimedia
information retrieval. We are interested in the rather general case where the similarity
criterion defines a metric space, instead of the more restricted case of a vector space.
Many solutions have been proposed in different areas, in many cases without cros ... 

16 A Document Storage Method Based on Polarized Distance 

R. T. Chien , E. A. Mark
Journal of the ACM (JACM) April 1974

Some elementary mathematical properties of term matching document retrieval
systems are developed. These properties are used as a basis for a new file
organization technique. Some of the advantages of this new method are (1) the
key-to-address transformation is easily determined; (2) the documentary information is
stored only once in the file; (3) the file organization allows the use of various matching
functions and thresholds; and (4) the dimensionality of the transform is easily
expanded ... 
1 Data clustering: a review

A. K. Jain , M. N. Murty , P. J. Flynn
ACM Computing Surveys (CSUR) September 1999
Volume 31 Issue 3

Clustering is the unsupervised classification of patterns (observations, data items or 
feature vectors) into groups (clusters). The clustering problem has been addressed in 
many contexts and by researchers in many disciplines; this reflects its broad appeal 
and usefulness as one of the steps in exploratory data analysis. However, clustering is 
a difficult problem combinatorially, and differences in assumptions and contexts in 
different communities has made the transfer of useful generic co ... 

93% 

2 Model-based recognition in robot vision 

Roland T. Chin , Charles R. Dyer
ACM Computing Surveys (CSUR) March 1986
Volume 18 Issue 1

This paper presents a comparative study and survey of model-based object- 
recognition algorithms for robot vision. The goal of these algorithms is to recognize the 
identity, position, and orientation of randomly oriented industrial parts. In one form 
this is commonly referred to as the "bin-picking" problem, in which the parts to be 
recognized are presented in a jumbled bin. The paper is organized according to 2-D, 
2½-D, and 3-D object representations, which are used as the basis for ...



3 An optimal algorithm for approximate nearest neighbor searching 

Sunil Arya , David M. Mount , Nathan S. Netanyahu , Ruth Silverman , Angela Wu
Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
January 1994

4 Making faces 

Brian Guenter , Cindy Grimm , Daniel Wood , Henrique Malvar , Fredric Pighin
Proceedings of the 25th annual conference on Computer graphics and interactive
techniques July 1998 

5 Surveillance: Invariance in motion analysis of videos 85% 

Cen Rao , Mubarak Shah , Tanveer Syeda-Mahmood
Proceedings of the eleventh ACM international conference on Multimedia
November 2003

In this paper, we propose an approach that retrieves motion of objects from the videos 
based on the dynamic time warping of view invariant characteristics. The motion is 
represented as a sequence of dynamic instants and intervals, which are automatically 
computed using the spatiotemporal curvature of the trajectory of moving object in the 
videos. Dynamic Time Warping (DTW) method matches trajectories using a view 
invariant similarity measure. Our system is able to incrementally learn different a ... 



6 Three-dimensional object recognition 83%

Paul J. Besl , Ramesh C. Jain 
ACM Computing Surveys (CSUR) March 1985 
Volume 17 Issue 1 

A general-purpose computer vision system must be capable of recognizing three- 
dimensional (3-D) objects. This paper proposes a precise definition of the 3-D object 
recognition problem, discusses basic concepts associated with this problem, and 
reviews the relevant literature. Because range images (or depth maps) are often used 
as sensor input instead of intensity images, techniques for obtaining, processing, and 
characterizing range data are also surveyed. 

7 Adaptation/load balancing: A method for decentralized clustering in large multi-agent systems 82%

Elth Ogston , Benno Overeinder , Maarten van Steen , Frances Brazier

Proceedings of the second international joint conference on Autonomous agents
and multiagent systems July 2003

This paper examines a method of clustering within a fully decentralized multi-agent 
system. Our goal is to group agents with similar objectives or data, as is done in 
traditional clustering. However, we add the additional constraint that agents must 
remain in place on a network, instead of first being collected into a centralized 
database. To do this we connect agents in a random network and have them search in 
a peer-to-peer fashion for other similar agents. We thus aim to tackle the basic clus ... 



8 WALRUS: a similarity retrieval algorithm for image databases 82% 

Apostol Natsev , Rajeev Rastogi , Kyuseok Shim
ACM SIGMOD Record , Proceedings of the 1999 ACM SIGMOD international 
conference on Management of data June 1999 
Volume 28 Issue 2 

Traditional approaches for content-based image querying typically compute a single 
signature for each image based on color histograms, texture, wavelet transforms etc.,
and return as the query result, images whose signatures are closest to the signature of 
the query image. Therefore, most traditional methods break down when images 
contain similar objects that are scaled differently or at different locations, or only 
certain regions of the image match. In this pape ... 

9 Session 9: image indexing and retrieval: DynDex: a dynamic and non-metric space indexer 80%

King-Shy Goh , Beitao Li , Edward Chang
Proceedings of the tenth ACM international conference on Multimedia December 2002

To date, almost all research work in the Content-Based Image Retrieval (CBIR)
community has used Minkowski-like functions to measure similarity between images. 
In this paper, we first present a non-metric distance function, dynamic partial function 
(DPF) which works significantly better than Minkowski-like functions for measuring 
perceptual similarity; and we explain DPF's link to similarity theories in cognitive 
science. We then propose DynDex, an indexing method that deals with both the 
dynam ... 

10 Approximation of protein structure for fast similarity measures 80%

Fabian Schwarzer , Itay Lotan
Proceedings of the seventh annual international conference on Computational
molecular biology April 2003

It is shown that structural similarity between proteins can be decided well with much 
less information than what is used in common similarity measures. The full Cα
representation contains redundant information because of the inherent chain topology 
of proteins and a limit on their compactness due to excluded volume. A wavelet 
analysis on random chains and proteins justifies approximating subchains by their 
centers of mass. For not too compact chain-like structures in general, and ... 

11 Iterative refinement by relevance feedback in content-based digital image retrieval

M. E. J. Wood , B. T. Thomas , N. W. Campbell
Proceedings of the sixth ACM international conference on Multimedia September 1998

12 Data streams I: Clustering binary data streams with K-means 

Carlos Ordonez

Proceedings of the 8th ACM SIGMOD workshop on Research issues in data 
mining and knowledge discovery June 2003 

Clustering data streams is an interesting Data Mining problem. This article presents 
three variants of the K-means algorithm to cluster binary data streams. The variants 
include On-line K-means, Scalable K-means, and Incremental K-means, a proposed 
variant introduced that finds higher quality solutions in less time. Higher quality of 
solutions are obtained with a mean-based initialization and incremental learning. The 
speedup is achieved through a simplified set of sufficient statistics and oper ... 

13 Similarity querying II: Using sets of feature vectors for similarity search on voxelized CAD objects 77%

Hans-Peter Kriegel , Stefan Brecheisen , Peer Kroger , Martin Pfeifle , Matthias Schubert
Proceedings of the 2003 ACM SIGMOD international conference on
Management of data June 2003

In modern application domains such as multimedia, molecular biology and medical 
imaging, similarity search in database systems is becoming an increasingly important 
task. Especially for CAD applications, suitable similarity models can help to reduce the
cost of developing and producing new parts by maximizing the reuse of existing parts. 
Most of the existing similarity models are based on feature vectors. In this paper, we 
shortly review three models which pursue this paradigm. Based on the most ... 